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Abstract 

In this paper, we introduce two new features for the design of electroencephalography (EEG) based 
Brain-Computer Interfaces (BCI): one feature based on multifractal cumulants, and one feature based 
on the predictive complexity of the EEG time series. The multifractal cumulants feature measures 
the signal regularity, while the predictive complexity measures the difficulty to predict the future of 
the signal based on its past, hence a degree of how complex it is. We have conducted an evaluation 
of the performance of these two novel features on EEG data corresponding to motor-imagery. We 
also compared them to the most successful features used in the BCI field, namely the Band-Power 
features. We evaluated these three kinds of features and their combinations on EEG signals from 
13 subjects. Results obtained show that our novel features can lead to BCI designs with improved 
classification performance, notably when using and combining the three kinds of feature (band-power, 
multifractal cumulants, predictive complexity) together. 

1 Introduction 

Brain-Computer Interfaces (BCI) are communication systems that enable users to send commands to a 
computer by using only their brain activity |27| , this activity being generally measured using ElectroEn- 
cephaloGraphy (EEG). Most EEG-based BCI are designed around a pattern recognition approach: In 
a first step features describing the relevant information embedded in the EEG signals are extracted [4]. 
They are then feed into a classifier which identifies the class of the mental state from these features [T7] . 
Therefore, the efficiency of a BCI, in terms of recognition rate, depends mostly on the choice of appro- 
priate features and classifiers. Despite the large number of features that have been explored to design 
BCI U, the performances of current EEG-based BCI are still not satisfactory, and the BCI community 
has stressed the need to further explore and design alternative features [19] . 

In this paper, we focus on feature extraction from EEG signals for the design of BCI based on motor 
imagery (MI) (55]. MI corresponds to the imagination of limb movements (e.g., hand or feet) without 
actual output. It is known to trigger brain signals variations in specific frequency bands that can be used 
to drive a BCI The contribution of this paper is two-fold. First, we introduce two new features for 
MI classification in EEG-based BCI. These two new features are based on: 1. multifractal cumulants 
|26| and 2. predictive complexity [5]. Second, we perform systematic comparisons and analysis of the 
performance of these two new features, together with the most successful feature for motor-imagery 
based-BCI, namely, band-power feature |13) . 

The first new feature, namely, multifractal cumulants, can be seen as a statistic on inter-frequency 
band relations. This is particularly relevant for BCI as this information is generally ignored in current 
motor imagery-based BCI designs, mostly based on the sole power in different frequency bands. It should 
be mentioned that a preliminary study of this kind of feature has been presented in a conference paper 
[S] . The second new feature is based on the statistical complexity and predictive properties of the time 
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series [5]. The information (quantified in bits) that is extracted this way measures how difficult it is to 
make an optimal prediction based on past information. It is null both for totally ordered and totally 
random systems, and increases in between. It has already been applied to single simulated neurons [7] 
and a related form was applied to measure synchronisation in the brain |15| . The assumption is that 
performing a mental task (e.g., motor imagery) makes the EEG signal either more or less predictable, 
which can be detected by a classifier when quantified by this second new feature. 

The remainder of this paper is organised as follows: Sections [5] and [3] present the two new features 
that we propose. Section 0] presents the experimental evaluation, including the data sets that were used. 
The results are then discussed. Section [5] concludes this study. 

The code used for all the experiments in this paper is provided as Free/Libre software. The data used 
is available online and all the presented results are reproducible independently. 

2 Multifractal cumulants 

The multifractal formalism is described in details in [3T] . This section presents a short overview for 
the needs of this document. 

Intuitively, the multifractal cumulants of the signal capture a signature of inter-band relations (see 
below). This contrasts to the power in each frequency band that is generally used. As shown in [S] 
the multifractal spectrum can in itself be used for EEG classification. When considering multifractal in 
addition to power band feature vectors, the resulting combination may improve the classification accuracy. 

The method we chose for extracting the multifractal spectrum is a discrete wavelet transform of the 
signal, out of which we extract the wavelet leader coefficients [2]. Following the directions of [26] we then 
use the cumulants of the leaders as the features for classification, unlike what we previously did in [5]. 

Let x{t) be the signal to analyse. One view on multifractal analysis [TS] is to relate the statistical 
properties of x{t) and of a scaled version of it x{at). In terms of frequency analysis, that scaling in time 
corresponds to a frequency shift. Hence, another view of the multifractal cumulants feature is that they 
characterise some form of inter-frequency information, as mentioned in the introduction of this section. 

More precisely: 

• The signal x{t) is decomposed using a Discrete Wavelet Transform to get the wavelet coefficients 
w(s,ts) at each dyadic scale s and time interval is- 

• The wavelet leaders at each scale s are then extracted by computing the maxima of the wavelet 
coefficients over all samples involved for computing z/;(s,ts — 1), w{s^ts) and w{s,ts + 1) (including 
lower scales) [5]. 

• Instead of performing a Legendre transform, or a direct Holder exponent density estimation as in 
[8], we use here the recent technique introduced by [26j and compute the wavelet leader cumulants 
of orders 1 to 5. As noted in [55] the first few cumulants already contain most of the information 
useful in practice for characterising the distribution of the Holder exponents. For a classification 
task this information can now be exploited in a more condensed form. 

• The 5 first cumulants are computed for the leaders at each scale s. Considering there is at most L 
levels of wavelet transform in a signal of size between 2^ and 2^~^^, we get a total of 5 x L cumulants 
for the signal, that progressively encompass more and more frequency bands as the scale increases. 
These 5 x L cumulants per channel are used as the feature vector. 

This method can be quite sensitive to the presence of electromagnetic interferences at 50 Hz. We thus 
pre- filter the signals as described in section HTTj before proceeding to the multifractal cumulants estimation. 

3 Predictive complexity measure 

This paper introduces for the first time the predictive complexity measure of [5] in the context of EEG 
classification. 

The intuitive idea behind this feature is to quantify the amount of information that is necessary to 
retain from the past of the series in order to be able to predict optimally the future of the series (|24|. 
and see below for the optimality criterion). 

We had indication from related previous works that the feature could be relevant for EEGs: 
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• At the level of neurons: statistical complexity was used to describe the computational structure 
of spike trains |12| . It was also shown to decrease while the neurons are learning in an artificial 
spiking neural network [?]• 

• At the level of the brain: information coherence and synchronisation mechanism between com- 
munities of neurons are presented in [15) . relying on related techniques. 

3.1 Decisional states and the corresponding complexity measure 

Informally, the idea behind the decisional states is to construct a Markovian automaton [TTl [51] whose 
states correspond to taking the same decisions j^, according to a user-defined utility function. These 
decisions are those that one can take based on predictions of the future and their expected utility. 

The complexity of the series is then computed as the mutual information between the internal states 
of the Markovian automaton, and the series itself. The complexity is null for a very regular series, for 
example a constant series or a series where we always take the same decision: there is only a single state 
in the automaton. Similarly the complexity of a completely random series is also null: it can be modelled 
by successive independent draws from a fixed probability distribution, whose expected utility we take to 
make our decision. This leads again to a single Markovian automaton state, hence a null complexity. 
The complexity measure increases only for more complicated series with many internal states (i.e. many 
distinct probability distributions of what happens next, depending on what previously happened, leading 
to different decisions). 

Presumably when the EEG corresponds or not to some functional activity, the complexity of the series 
should change. The idea is to plug machine learning techniques for monitoring that change. 

Formal description 

Formally, let (sf) be a time series, with t the time index. Let Sj °° = {su)u<t 

and s+°° = {su)t<u 

be the past and future histories at time t. In practice and for real measures, the time range is finite: 
< t < T. Similarly we measure the past and future histories with finite horizons: s^^ = (sm)(_^<„<( 

and = {su)t<u<t+k- 

Let us now consider predicting the future from the past statistically: we seek to determine P (s^'""|s^'') 
at each time t (possibly with h or k infinite). Assuming the system is conditionally stationary, that 
distribution does not change through time: the same causes produce the same consequences. Let then 
S~'^ = {s^''}yj and S'^'^ = {st''}\ft ^^^^ P^^^ ^'^"-^ future histories. We will drop the time 

indices from now on to indicate the time shift invariance. 

The causal states C, are defined as the equivalence classes of past histories with the same conditional 
distribution of futures: C = {x £ S~'^ : P(s+'^|a;) = P (s+'^|s~'') } = |a; : x = s~^^. Knowing 

the causal state at the current time is the minimal information needed for making optimal predictions 
|24| using the full conditional probability distribution. 

In the discrete case the series are strings of symbols drawn from an alphabet A. Each time step implies 
a symbol transition, which possibly leads to a different causal state. The corresponding automaton is 
called the e-machine 

Let us now introduce a utility function u : (5'+'^)^ i— > R, such that u (r+'^,s+'^) quantifies the gain 
(positive) or loss (negative) when the user relied on the prediction r^'^ while 5+*^ actually happened. 
We can now define an expected utility: U [r+'"'|s^'''] = Eg+tg^+fc [u (r^'^, s^*^) |s^''], quantifying what 
utility can be expected on average when choosing the prediction r+'"" for the current system state s"''. 
The set of optimal predictions, realising the maximal expected utility, can now be defined as Y (s"'') = 
argmax^+fcg5+fcU [r+'^js"'']. 

p u cl 

The following equivalence relations =, =, and = naturally extend the causal state equivalence relation 
=, taking into account the utility function: 

• = s~^' when Y {r~^^ = y(s~''), with the corresponding iso-prediction sets as equivalence 
classes. All past histories within the same class lead to choosing the same predictions, even though 
the expected utility may change from one past history to the other. 

• r~'^ = s"'' when max^+kg^+t U [r+'"'|r~''] = max^+k^g+k U [r+'^|s~''] , with the corresponding iso- 
utility sets as equivalence classes. All past histories within the same class lead to the same maximum 



3 



expected utility, even though the optimal predictions to choose for reaching this utility are not 
specified. 

• r"'' = s^'' when r^'* = s^'* and r^^ = s^''. The intersection of the above iso-utility and iso- 
prediction sets define the decisional states: ^'(.s"'^) = |a- G : x = s^'', x = s~''|. We assume 
that when both the maximal expected utility and the optimal predictions are the same, the user 
will reach the same decisions. In other words, the utility function encodes all the user needs to 
know to reach a decision. 

It can be easily shown |5] that the causal states sub-partition the decisional states. That is to say, 
the causal states have lost their minimality property due to the fact we are not interested in the full 
conditional distribution of futures but only in the optimal decisions with respect to a user-defined utility 
function. 

The causal states are an intrinsic property of the data set. The mutual information Mc = I{s~^; C {^~'^)) 
between the causal C (s"'') state and the series of defines an intrinsic measure, the statistical com- 
plexity [Tl], quantifying how hard it is to get the conditional distribution of futures from the current 
observed past. 

The decisional states correspond to the structure implied by the user utility function on top of the 
causal states. As for the causal states, knowing the decisional state at the current time is the inform- 
ation needed for making optimal predictions maximising the user-defined utility function. The mutual 
information Aid — /(s~'';^'(s~'')) similarly defines a complexity measure for how hard it is to make 
these predictions, called decisional complexity by analogy with the statistical complexity. 

3.2 Application to EEG data 

In the present study the chosen utility function is the negative sum of square error: M(r+'^,s+'^) = 

Each EEG series is split in time windows of ft, -I- A; samples, defining each a [s^^, 5+*^) pair. These 
observations are fed in the reconstruction algorithm presented in [5] . That algorithm estimates the joint 
probabilities p (s"'', 5+*^) using a kernel density estimation with Gaussian kernels. The conditional prob- 
abilities are then computed by integration over 5+*^: p (s~^'^\s^'^) = p [s^'^, 3+*^) / J^+k^g+k P (s^'*, cr+'') . 
These are then clustered into causal state estimates C (s^'*) = {x G S^'^ : p (s+'^|x) = p (s+'^|s^'*)} us- 
ing connected components up to a fixed Bhattacharyya distance threshold (chosen to be 0.05) between 
the conditional distributions. The utility function is applied on top of the aggregated causal states dis- 
tributions p (s+'^IC) ~ avgg-hg^p (s+'^js^'') in order to get the expected utilities U . Finally, the 
causal states are themselves clustered into iso-utility and iso-prediction sets, which are intersected to get 
the decisional states w. 

The decisional complexity is the feature that is extracted from the EEG series. We thus get one 
feature per electrode. 

4 Evaluation 
4.1 Data sets 

Four data sets for a total of 13 subjects were used for evaluation in this study. Data for 12 subjects 
come from the international BGI competition^ II. Ill, and IV HKH. The data for the last subject 
was acquired at INRIA (French National Research Institute on Computer Science and Control) Rennes- 
Bretagne Atlantique, using the OpenViBE software platform |23pl . 

4.1.1 BCI competition II, data set III (1 subject) 

This data set was captured at the Department of Medical Informatics, Institute for Biomedical Engin- 
eering, University of Technology Graz [T]. The data contains 280 trials sampled at 128 Hz. During each 
trial, the subject is presented with a visual cue indicating either left or right at random, and shall then 
imagine a movement of the corresponding hand during 6s. There was 140 trials of left class (left hand 
motor imagery) and 140 trials of the right class (right hand motor imagery). EEG were recorded using 

^ http: / /www. bbci.de / competition / 
^http: / /openvibe.inria.fr/?q=datasets 
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the C3, C4 and Cz electrodes, however, for the purpose of this evaluation, we used only the C3 and C4 
electrodes as recommended in [T^]. More details about this data set can be found in [S]. 

The data was already preprocessed by a band-pass filter between 0.5 and 30 Hz. Unfortunately, the 
nature of the filter is not specified, and the DC component was not well removed. Since this interferes 
in particular with the algorithm for estimating the series complexity, a new filter is applied with the 
following characteristics: 

• FIR fiher obtained using METEOR gS] 

• Less than IdB change in 4-30Hz with a linear phase response 

• At least -50dB at Hz (and above 40 Hz, though in the present case the signal is already filtered) 

• 1/4 sec delay 

We then used all the available filtered data over the feedback period in order to extract the features. 

4.1.2 BCI competition III, data set Illb (2 subjects) 

This data set was also captured at the Department of Medical Informatics, Institute for Biomedical 
Engineering, University of Technology Graz [T]. It originally consists of EEG signals from 3 subjects who 
performed left hand and right hand motor imagery. However, for the purpose of this study, only subjects 
labeled S4 and Xll were used. Indeed, EEG signals for subject 03VR were recorded using a difi'erent 
protocol and the data file provided online contained erroneously duplicate sig nalfl. The experimental 
protocol for subjects S4 and Xll is similar to the one used for the BCI competition II data set HI. The 
data was captured at a 125 Hz sampling rate using electrodes C3 and C4. More details about this data 
set can be found in [5]. EEG signals in this data set were already band-pass filtered in 0.5-30 Hz. As 
for the previous data set, an additional FIR filter with the same characteristics as the previous one was 
applied, for the same reasons. We also used all the available filtered data over the feedback period in 
order to extract the features. 

4.1.3 BCI competition IV, data set lib (9 subjects) 

This third data set was provided by the same team and follows the same experimental protocol as the 
above two. However it contains occular artifacts that interfere with the brain signals. Although this is 
less convenient for the purpose of testing our two new features it is also more realistic. We thus choosed 
to ignore the artifacts and process as if the signals were clean EEGs, in order to assert the robustness of 
our features to the occular artifacts. 

Data coming from nine subjets is sampled at 250 Hz, and pre-processed by a band-pass filter between 
0.5 and 100 Hz with a notch at 50 Hz. For the aforementioned reasons we had to re-filter the signals. 
We choosed a FIR design with a band-pass between 6 and 30 Hz, with a null at OHz to suppress the DC 
component and -50dB above 40 Hz. 

We also used all the available filtered data over the feedback period in order to extract the features. 

4.1.4 OpenViBE / INRIA data (1 subject) 

This data set was captured at INRIA Rennes-Bretagne Atlantique using the OpenViBE free and open- 
source software [23] • This data set comprises EEG signals from one subject who performed left hand and 
right hand motor imagery. 560 trials of motor imagery (280 trials per class) were recorded over a 2 week 
period. Data were collected using the same experimental protocol as the one used for the BCI competition 
data, i.e., the Graz BCI protocol [21]. Half of the trials (randomly selected from all experiments over 
time) is used for training, the remaining half for testing. 

EEG data was sampled at 512 Hz and recorded using the following electrodes: C3, C4, FC3, FC4, C5, 
CI, C2, C6, CP3, CP4, with a nose reference electrode. The use of such electrodes enables us to apply 
a discrete Laplacian spatial filter [14] over C3 and C4 in order to obtain better signals, as recommended 
in EH]- 

The data was preprocessed by a FIR filter (designed with METEOR [1^) with the following charac- 
teristics: 

• Less than IdB change in 4-30Hz, linear phase response in this range. 

3See [http:// www.bbci.de/competition/iii/desc_IIIb_ps.htmllfor details 
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• At least -50dB at Hz and above 50 Hz. 

• Null responses at 50 Hz and all harmonics. 

• 1/4 sec delay. 

We then used all the available filtered data over the feedback period in order to extract the features. 

4.2 Results obtained with all features and their combinations 

The goal of this section is to show what results can be obtained with each feature considered in isolation, 
and the effect of combining them. The hypothesis we wish to test is that each feature extracts a different 
information from the signal, and thus that combining them can improve the classification accuracy. 

4.2.1 Features 

For the experiments below, the following methodology was applied: 

• The power in frequency bands was extracted for each subject. As was shown by |13) and in our 
own forthcoming study |10| estimating this information from data is sensitive to choice of the time- 
frequency decomposition algorithm. We extracted the power information in each frequency band 
using a Morlet Wavelet which support is determined by cross-validation. All bands were then kept 
between 4 and 30 hertz, for each of the C3 and C4 electrode: this leads to a 52-dimensional feature 
vector. 

• The multifractal cumulants feature was extracted according to the method described in Section [51 
Cross-validation on the training set was used to determine the wavelet support and the number of 
decomposition levels to retain. Since we have 5 cumulants per level per electrode, this leads to a 
10 * A'^-dimensional vector with N the number of retained levels. 

• Parameters for estimating the predictive complexity feature were also determined by maximising 
the cross-validation performance on the training set: the number of points to retain from the 
past and the future, the sub-sampling factor for the series, the kernel size used for estimating the 
conditional probability distributions, and whether to work on the raw series or the first differences. 
Cross-validation selected 1 point in the future, 5 points from the past with a sub-sampling factor 
of 8 for the OpenViBE subject, and 6 points from the past with a sub-sampling factor of 2 for the 
BCI H and HI subjets. These sub-sampling parameters correspond to having a sampling frequency 
of 62.5 Hz for the S4 and Xll subjects and 64 Hz for the others, which thanks to Nyquist theorem 
matches the filtering operation that was performed on the signals: cross-validation selected the 
most economical sub-sampling parameter that still captures the remaining frequencies in the signal 
between 4 and 30 Hz. We thus saved computational time for the BCI IV subjets by fixing the 
subsampling parameters and cross-validating only the kernel width. All sub-sampled signals (ex: 
all 8 possible series for a sub-sampling by factor 8) were kept for building the statistics in the 
complexity computations. 

Therefore the feature vectors include: 

• 52 power features for frequencies between 4 and 30 Hz. (26 for each C3 / C4 electrode) 

• 30; 40, or 50 features for multifractal cumulants. 

• 2 features for the predictive complexity (one for each electrode). 

As a side remark we shall point out that cross-validating all the above parameters required a quite con- 
sequent computational power, which was obtained using the BOINC distributed computing infrastructure 
[3]. Nevertheless, once the optimal parameters were determined off-line by cross-validation, the compu- 
tational requirements for extracting the features are nothing exceptional: the features can be computed 
in real time on a standard PC. 
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Legend: Green when the classification accuracy of the combination is higher by at least 1% than when using only 
the usual Band-Power feature, blue when the change in performance with the combination is between -1% and 
1%, orange when the combination is worse. 

Table 1: Classification accuracies over the test sets for each subject and feature 

4.2.2 Classifier and combination rule 

In order to classify ttie extracted features, we used a Linear Discriminant Analysis (LDA), one of the 
most popular and efficient classifier for EEG-based BCI |17j . 

Each series was classified independently into either the "left hand" or "right hand" class of the motor 
imagery task using each feature independently. The percentage of correct classifications on the test set 
is reported for each subjet in the first 3 colums of Table [TJ 

We also evaluated the accuracy obtained when combining these three features together. To do so, 
we first estimate the global accuracy of each feature alone by using the Fisher discriminant ratio F on 
the training data set, hence we get i^P°^, ^mfa^ ^cpx fQj. ^j^g three features Band-Power, Multifractal 
cumulants, and Complexity. In order to combine the results we gather the outputs of the 3 linear 
classifiers trained independently on each feature by performing a simple arithmetic average for each 
instance p™"*' = FP°* * Pf""' + F'^^'' * P^"^^"" + F^p"" * P^p'' where each Pi is a prediction (classifier 
output) for the instance number i. That value is tlren used for classifying the instance in the results 
reported in the fourth column of table [T] 

4.2.3 Results 

The multifractal cumulants and the predictive complexity features allow by themselves to classify the 
BCI data sets with a good accuracy. This, in itself, is a contribution of this paper. Although the accuracy 
obtained using either the Multifractal Cumulants or the Predictive Complexity is slightly lower on average 
that when using only the Power Bands feature (sometimes higher), we have extracted some information 
from the signal that is adequate for classification. 

Results also showed that combining all features is better than using the Power Bands alone for 6 
subjects, on the same order for 6 subjects, and with a negative impact for only one subject. A gain of 
1.4% accuracy was obtained on average, with much higher values for some subjects (see table [1]). This 
not only confirms that some different information was extracted from the signal, but also stresses the 
usefulness of our new features for BCI classification tasks. 

5 Conclusion 

This study has introduced two new features for Brain-Computer Interface design: multifractal cumulants 
and predictive complexity. The information contained in the multifractal cumulants feature corresponds 
to a relation between frequency bands, rather than the power extracted in each band. The complexity 
feature measures the difficulty to predict the future of the EEC signals based on their past. 

Interestingly enough, our results showed that the two new features, i.e., multifractal cumulants and 
predictive complexity measure, could indeed be used by themselves to discriminate between different 
motor imagery mental states as measured by EEC This is especially interesting as the complexity feature 
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only adds one scalar value per electrode, compared to the other features we considered (ex: one dimension 
per frequency band in the band-power case). 

Moreover, our results showed that when combining these two features together with band-power 
features, the resulting BCI could reach a better classification accuracy than when using the usual band- 
power feature alone. Thus this suggests that these new features are a good complement to currently used 
features for BCI design and that they can lead to improved BCI design. Therefore we would recommend 
BCI designers to consider these two features as additional features in the conception of a BCI system, in 
order to obtain better performance. 

Future work might involve the exploration of novel ways to combine the features and feature selection, 
as well as the application of these features to BCI tasks other than motor imagery. Work is also needed on 
the design of novel algorithms including physiologically relevant error functions for EEC signal predictions 
for the complexity feature. 



Appendix: Source code, data, and web 
information 

All the results in this study are reproducible independently. The code for the experiments is provided as 
Free/Libre software on the main author web site: 



http: / / nicolas.brodu.numerimoire.net /en/ recherche/publications ^ 



The data that was used can be downloaded on the BCI competitions website and on the OpenViBE 
project page: 



http: / / www.bbci.de/competition/ 



http: / / openvibe.inria.fr / ?q=datasets 
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